Accuracy Analysis of Generalized Pronunciation Variant Selection in ASR Systems
Abstract
Automatic speech recognition (ASR) systems typically rely on a pronunciation dictionary to generate the expected phonetic content of the words in a recognized utterance. Pronunciation, however, can vary in many situations. Besides the cases where several pronunciation variants are specified manually in the dictionary, there are typically many other possible changes in pronunciation that depend on word context or speaking style, which is very typical for Czech, the language considered here. In this paper we study how accurately automatically predicted pronunciation variants are selected in Czech HMM-based ASR systems. We analyze the correctness of pronunciation variant selection in forced alignment of known utterances used as ASR training data. Using the proper pronunciation variant, more exact transcriptions of the utterances were created for further purposes, mainly for more accurate training of acoustic HMM models. Finally, since the target and most important application is LVCSR systems, the accuracy of LVCSR results was tested using different levels of automated pronunciation generation.
Similar resources
A data-driven method for modeling pronunciation variation
This paper describes a rule-based data-driven (DD) method to model pronunciation variation in automatic speech recognition (ASR). The DD method consists of the following steps. First, the possible pronunciation variants are generated by making each phone in the canonical transcription of the word optional. Next, forced recognition is performed in order to determine which variant best matches th...
Improving the Arabic Pronunciation Dictionary for Phone and Word Recognition with Linguistically-Based Pronunciation Rules
In this paper, we show that linguistically motivated pronunciation rules can improve phone and word recognition results for Modern Standard Arabic (MSA). Using these rules and the MADA morphological analysis and disambiguation tool, multiple pronunciations per word are automatically generated to build two pronunciation dictionaries; one for training and another for decoding. We demonstrate that...
Modeling pronunciation variations for non-native speech recognition of Korean produced by Chinese learners
Recognition accuracy for non-native speech is often too low to make practical use of ASR technology in interfaces such as CAPT systems. This paper describes how we adapted a Korean ASR system to Chinese speakers in order to build a Korean CAPT system for L1 Mandarin Chinese learners, by modeling pronunciation variations frequently produced by Chinese learners. Based on pronunciation variation rules des...
Non-native Pronunciation Variation Modeling for Automatic Speech Recognition
Communication using speech is inherently natural, with this ability of communication unconsciously acquired in a step-by-step manner throughout life. In order to explore the benefits of speech communication in devices, there have been many research works performed over the past several decades. As a result, automatic speech recognition (ASR) systems have been deployed in a range of applications...
Speech is like a box of
Pronunciation variability is present in both native and foreign words. Since pronunciation variability constitutes a problem for automatic speech recognition (ASR) systems, modeling pronunciation variation for ASR has been the topic of various studies. In most studies, modeling pronunciation variation was attempted within the standard framework used in mainstream ASR systems. Given that some as...